Cost Estimation Techniques for Database Systems

Author

  • Ashraf Aboulnaga

Abstract

This dissertation develops advanced selectivity and cost estimation techniques for query optimization in database systems. It addresses three issues related to current trends in database research: estimating the cost of spatial selections, building histograms without looking at the data, and estimating the selectivity of XML path expressions.

The first part of this dissertation deals with estimating the cost of spatial selections, or window queries, where the query windows and the data objects are general polygons. Previously proposed cost estimation techniques only handle rectangular query windows over rectangular data objects, thus ignoring the significant cost of exact geometry comparison (the refinement step in a “filter and refine” query processing strategy). The cost of the exact geometry comparison depends on the selectivity of the filtering step and the average number of vertices in the candidate objects identified by this step. We develop a cost model for spatial selections that takes these parameters into account. We also introduce a new type of histogram for spatial data that captures the size, location, and number of vertices of the spatial objects. As we demonstrate experimentally, capturing these attributes makes this type of histogram useful for accurate cost estimation with our cost model.

The second part of the dissertation introduces self-tuning histograms. While similar in structure to traditional histograms, self-tuning histograms are built not by examining the data or a sample thereof, but by using feedback from the query execution engine about the selectivities of range selections on the histogram attributes to progressively refine the histogram. Since self-tuning histograms have a low up-front cost and the cost of building them is independent of the data size, they are an attractive alternative to traditional histograms, especially multidimensional histograms. The low cost of self-tuning histograms can help a self-tuning, self-administering database system experiment with building many different histograms on many different combinations of data columns. This is useful since such a system cannot rely on a database administrator to decide which histograms to build.
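The feedback loop behind self-tuning histograms can be illustrated with a minimal one-dimensional sketch. This is not the dissertation's algorithm: the names `Bucket`, `estimate`, `refine`, and the `damping` factor are hypothetical, and the update rule (spreading the estimation error over the overlapping buckets in proportion to their current contribution) is an assumption chosen for illustration. The point it shows is that the histogram is adjusted only from observed result sizes, never from the data itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    lo: float    # inclusive lower bound of the bucket
    hi: float    # exclusive upper bound of the bucket
    freq: float  # current estimate of the number of tuples in [lo, hi)


def overlap_fraction(b: Bucket, lo: float, hi: float) -> float:
    """Fraction of the bucket's width covered by the query range [lo, hi)."""
    width = b.hi - b.lo
    covered = min(b.hi, hi) - max(b.lo, lo)
    return max(0.0, covered) / width if width > 0 else 0.0


def estimate(buckets: List[Bucket], lo: float, hi: float) -> float:
    """Estimated result size, assuming uniformity inside each bucket."""
    return sum(b.freq * overlap_fraction(b, lo, hi) for b in buckets)


def refine(buckets: List[Bucket], lo: float, hi: float,
           actual: float, damping: float = 0.5) -> None:
    """Adjust bucket frequencies using the observed result size of one query.

    Assumed rule (illustrative only): the damped error is split across the
    overlapping buckets in proportion to what each contributed to the estimate.
    """
    shares = [b.freq * overlap_fraction(b, lo, hi) for b in buckets]
    est = sum(shares)
    touched = [i for i, s in enumerate(shares)
               if overlap_fraction(buckets[i], lo, hi) > 0]
    if not touched:
        return
    error = damping * (actual - est)
    for i in touched:
        # Fall back to an equal split when the current estimate is zero.
        weight = shares[i] / est if est > 0 else 1.0 / len(touched)
        buckets[i].freq = max(0.0, buckets[i].freq + error * weight)


if __name__ == "__main__":
    # Start from a uniform guess: 100 tuples spread over [0, 100).
    hist = [Bucket(0, 25, 25.0), Bucket(25, 50, 25.0),
            Bucket(50, 75, 25.0), Bucket(75, 100, 25.0)]
    print("before:", estimate(hist, 0, 50))
    refine(hist, 0, 50, actual=80)  # the executor reports 80 matching tuples
    print("after: ", estimate(hist, 0, 50))
```

In this run the estimate for the range [0, 50) moves from 50 toward the observed 80 (to 65 with a damping factor of 0.5), while buckets outside the query range are left unchanged. The cost of each update depends on the number of buckets, not on the size of the data.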

Similar resources

Using Neural Networks with Limited Data to Estimate Manufacturing Cost

Neural networks were used to estimate the cost of jet engine components, specifically shafts and cases. The neural network process was compared with results produced by the current conventional cost estimation software and linear regression methods. Due to the complex nature of the parts and the limited amount of information available, data expansion techniques such as doubling-data and data-cr...

Tire Inflation Pressure Estimation Using Identification Techniques

In this research study, one of the most crucial automotive engineering problems is intended to be solved. The necessity of tire pressure monitoring system is beyond doubt. Such systems are now provided relying on expensive sensors. In this study an indirect tire pressure monitoring system is proposed, utilizing identification techniques, which will reduce the cost of monitoring considerably in ...

Query Result Size Estimation Techniques in Database Systems

Query optimisers are critical to the efficiency of modern relational database systems. If a query optimiser chooses a poor query execution plan, the performance of the database system in answering the query can be very poor. In fact, the differences in cost between the least and most expensive query execution plans can be several orders of magnitude. On the other hand, it can be prohibitively e...

Wavelet-Based Cost Estimation for Spatial Queries

Query cost estimation is an important and well-studied problem in relational database systems. In this paper we study the cost estimation problem in the context of spatial database systems. We introduce a new method that provides accurate cost estimation for spatial selections, or window queries, by building wavelet-based histograms for spatial data. Our method is based upon two novel technique...

Cost Modeling and Range Estimation for Top-k Retrieval in Relational Databases

Relational databases have increasingly become the basis for a wide range of applications that require efficient methods for exploratory search and retrieval. Top-k retrieval addresses this need and involves finding a limited number of records whose attribute values are the closest to those specified in a query. One of the approaches in the recent literature is query-mapping which deals with con...

Publication date: 2002